Curvature of the energy landscape and folding of model proteins

We study the geometric properties of the energy landscape of coarse-grained, off-lattice models
of polymers by endowing the configuration space with a suitable metric, depending on the potential
energy function, such that the dynamical trajectories are the geodesics of the metric. Using numerical
simulations, we show that the fluctuations of the curvature clearly mark the folding transition,
and that this quantity allows one to distinguish between polymers having a protein-like behavior (i.e.,
polymers that fold to a unique configuration) and polymers which undergo a hydrophobic collapse but do not
have a folding transition. These geometrical properties are defined by the potential energy without
requiring any prior knowledge of the native configuration.

PACS numbers: 87.15.-v; 02.40.-k



Protein folding is one of the most fundamental and challenging open questions in molecular biology. Proteins are
polypeptides, i.e., polymers made of aminoacids, and since the pioneering experiments by Anfinsen and coworkers it
has been known that the sequence of aminoacids uniquely determines the native state, i.e., the compact configuration
the protein assumes in physiological conditions and which makes it able to perform its biological tasks. To
understand how the information contained in the sequence is translated into the three-dimensional native structure
is the core of the protein folding problem, and its solution would allow one to predict a protein's structure from the
sole knowledge of the aminoacid sequence. Moreover, solving the protein folding problem would make it possible to
engineer proteins which fold to any given structure (what is commonly referred to as the inverse folding problem),
which in turn would mean a giant leap in drug design. Despite many remarkable advances in the last decades, the
protein folding problem is still far from a solution.

Within the folding problem, a basic issue stems from the observation that not all polypeptides are proteins: only 
a very small subset of all the possible sequences of the twenty naturally occurring aminoacids have been selected by 
evolution. According to our present knowledge, all the naturally selected proteins fold to a uniquely determined native 
state, but a generic polypeptide does not. What, then, makes a protein different from a generic polypeptide? In
other words, which properties must a polypeptide have to behave like a protein, i.e., to fold into a unique native
state regardless of the initial conditions, when the environment is the correct one? Answering this question would 
not directly yield a solution of the folding problem, nonetheless it would indicate which are the minimal common 
properties of those polymers which fold like a protein. To this end, the energy landscape picture has emerged as 
crucial. Energy landscape, or more precisely potential energy landscape, is the name commonly given to the potential 
energy of interaction between the microscopic degrees of freedom of the system. Before being applied to
biomolecules, this concept had proven useful in the study of other complex systems, especially of supercooled liquids
and of the glass transition. The basic idea is very simple, yet powerful: if a system has a rugged, complex energy
landscape, with many minima and valleys separated by barriers of different height, its dynamics will experience a
variety of time scales, with oscillations in the valleys and jumps from one valley to another [2]. Then one can try to link
special features of the behavior of the system (e.g., the presence of a glass transition or the separation of time scales)
to special properties of the landscape, like the topography of the basins around minima or the energy distribution of
minima and saddles connecting them. In any case, a complex landscape yields complex dynamics, where the system
is very likely to remain trapped in different valleys when the temperature is not too high. This is consistent with
glassy behavior, but a protein does not show glassy behavior: it rather has relatively low frustration. This means
that the landscape must have some property that avoids too much frustration. This property is commonly
referred to as the folding funnel: though locally rugged, the low-energy part of the energy landscape is supposed
to have an overall funnel shape so that most initial conditions are driven towards the correct native state. Moreover, 
the dynamics must do that efficiently, or the protein would not fold in reasonable times; in other words, it must be 
"sufficiently unstable" to make trapping in local minima very unlikely, and saddles must efficiently connect non-native 
minima with the native state. However, a direct visualization of the energy landscape is impossible due to its high 
dimensionality, and its detailed properties must be inferred indirectly. A possible strategy is a local one: one searches 
for the minima of the landscape and then for the saddles connecting different minima. Although straightforward 
in principle, this is practically unfeasible for accurate all-atom potential energies, but may become accessible for
minimalistic potentials [23]. Minimalistic models are those where the polymer is described at a coarse-grained level,
as a chain of N beads where N is the number of aminoacids; no explicit water molecules are considered and the
solvent is taken into account only by means of effective interactions among the monomers. Minimalistic models can
be relatively simple, yet in some cases yield very accurate results which compare well with experiments [8]. The local
properties of the energy landscape of minimalistic models have been recently studied (see e.g. Refs. [9, 10, 11]) and
very interesting clues about the structure of the folding funnel and the differences between protein-like heteropolymers
and other polymers have been found: in particular, it has been shown that a funnel-like structure is present also in
homopolymers, but what makes a big difference is that in protein-like systems jumps between minima corresponding
to distant configurations are much more favoured dynamically.

The above mentioned local strategy to analyze energy landscapes requires however a huge computational effort if 
one wants to obtain a good sampling. So the following question naturally arises: is there some global property of the 
energy landscape which can be easily computed numerically as an average along dynamical trajectories and which is 
able to identify polymers having a protein-like behavior? The main aim of the present Letter is to show that such
a quantity indeed exists, at least for the minimalistic model we considered, and that it is of a geometric nature. In
particular, we will show that the fluctuations of a suitably defined curvature of the energy landscape clearly mark
the folding transition, while they do not show any remarkable feature when the polymer undergoes a hydrophobic collapse
without a preferred native state. This is at variance with thermodynamic global observables, like the specific heat, 
which show a very similar behavior in the case of a folding transition and of a simple hydrophobic collapse. 

It is a classic result of analytical dynamics that the stability properties of the trajectories of a dynamical system
are completely determined by the curvature of a suitable manifold, i.e., of the configuration space endowed with
a metric tensor $g$ depending on the potential energy $V(q_1, \ldots, q_N)$ such that its geodesics [24] coincide with the
dynamical trajectories [13]. Locally, a positive curvature implies stability, while negative curvatures are associated with
instability: accordingly, the metric $g$ is such that close to minima of $V$ the curvature is positive, while saddles have
negative curvatures, at least along some direction. However, instability can be generated also by the bumpiness of
the manifold: if the curvature fluctuates along a geodesic it may destabilize it even without assuming negative values,
the degree of instability being related to the size of the fluctuations (see Ref. for a review).
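The destabilizing effect of a fluctuating but always positive curvature can be illustrated with a toy calculation. The Python sketch below is not taken from the paper. It assumes, as a stand-in for the equation governing the separation of nearby geodesics, a scalar oscillator psi'' + K(t) psi = 0 with a periodically modulated curvature K(t) = k0 [1 + eps cos(2 sqrt(k0) t)], i.e., a Mathieu-type equation. The function name `integrate_deviation` and the parameters `k0` and `eps` are illustrative choices.

```python
import math

def integrate_deviation(k0=1.0, eps=0.3, dt=0.01, steps=10000):
    """Integrate psi'' + K(t) psi = 0 with velocity Verlet (toy model, an assumption).

    K(t) = k0 * (1 + eps * cos(2 * sqrt(k0) * t)) stays positive for |eps| < 1.
    Returns the largest |psi| reached during the run.
    """
    omega = math.sqrt(k0)

    def curvature(t):
        return k0 * (1.0 + eps * math.cos(2.0 * omega * t))

    psi, vel, t = 1.0, 0.0, 0.0
    acc = -curvature(t) * psi
    max_abs = abs(psi)
    for _ in range(steps):
        vel += 0.5 * dt * acc          # half kick
        psi += dt * vel                # drift
        t += dt
        acc = -curvature(t) * psi      # force at the new time
        vel += 0.5 * dt * acc          # half kick
        max_abs = max(max_abs, abs(psi))
    return max_abs

if __name__ == "__main__":
    print("constant curvature   :", integrate_deviation(eps=0.0))
    print("fluctuating curvature:", integrate_deviation(eps=0.3))
```

With `eps=0.0` the deviation stays bounded, whereas with `eps=0.3` it grows by orders of magnitude although K(t) never becomes negative. This is ordinary parametric resonance and is meant only as a qualitative analogy to the mechanism described above.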

The geometrization of the dynamics is not unique: a particularly convenient procedure was introduced by
Eisenhart by considering an enlarged $(N + 2)$-dimensional configuration space. In terms of the coordinates
$q^0, q^1, \ldots, q^N, q^{N+1}$, where $q^1, \ldots, q^N$ are the Lagrangian coordinates and $q^0$ and $q^{N+1}$ are two extra coordinates, the
nonzero components of the Eisenhart metric tensor are (we set the masses of the particles equal to 1 for simplicity)
$g_{00} = -2V(q)$ and $g_{ii} = g_{0\,N+1} = g_{N+1\,0} = 1$ ($i = 1, \ldots, N$); one can prove that the geodesics of the Eisenhart metric
project onto dynamical trajectories. The mathematical object which contains all the information on the curvature
is the curvature tensor $R$ [12]: in the case of the Eisenhart metric it turns out to be very simple, for its nonzero
components are given by the Hessian of the potential $V$, $R_{0i0j} = \partial_i \partial_j V$. The curvature of the Eisenhart metric is
then just the curvature of the energy landscape itself, as a function of $(q^1, \ldots, q^N)$. The quantity which actually
determines the stability properties of a geodesic of velocity $v$ (in a given direction $w \perp v$) is the sectional curvature
$K(v, w) = R_{ijkl}\, v^i w^j v^k w^l / |v \wedge w|^2$ [13]. In $N$ dimensions there are $N - 1$ independent directions $w$; nonetheless the
most important information is already contained in the average of $K(v, w)$ over the $N - 1$ possible directions of $w$ [15].
This scalar quantity is called the Ricci curvature and is given by $K_R(v) = R_{ij} v^i v^j$, where $R_{ij} = R^k_{\ ikj} = g^{kl} R_{likj}$ are
the components of the Ricci tensor. In the case of the Eisenhart metric, the Ricci curvature along the direction of
the velocity vector (i.e., the Ricci curvature "felt" by the system during its motion, and which we will refer to simply
as $K_R$, dropping the dependence on $v$) is nothing but the Laplacian of the potential $V$,

$$K_R = \Delta V , \qquad (1)$$

i.e., the average curvature of the energy landscape. We may then expect that the statistical distribution of $K_R$ on
the configuration space contains relevant information on the stability of a generic trajectory: such an observable is 
then a good candidate for a global quantity able to catch some of the features of the landscape which characterize a 
protein-like behavior. 
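Since the Ricci curvature of the Eisenhart metric reduces to the Laplacian of the potential, it can be evaluated at any configuration from the potential alone. The following Python sketch shows one possible way to do so, assuming a central finite-difference estimate of the second derivatives; the paper does not state how the Laplacian was computed, and an analytic expression would normally be preferable. The names `laplacian` and `toy_potential` and the step `h` are hypothetical.

```python
def laplacian(potential, q, h=1e-3):
    """Central-difference estimate of sum_i d^2 V / dq_i^2 at the point q.

    `potential` maps a list of coordinates to a float; `h` is the finite-difference step.
    """
    v0 = potential(q)
    total = 0.0
    for i in range(len(q)):
        q_plus = list(q)
        q_minus = list(q)
        q_plus[i] += h
        q_minus[i] -= h
        total += (potential(q_plus) - 2.0 * v0 + potential(q_minus)) / (h * h)
    return total

def toy_potential(q):
    """Hypothetical two-coordinate test potential V = x^4 + y^2 (not the polymer model)."""
    x, y = q
    return x ** 4 + y ** 2

if __name__ == "__main__":
    # The exact Laplacian of the toy potential is 12 x^2 + 2.
    print(laplacian(toy_potential, [0.5, 1.0]))
```

Averaging the values returned along a trajectory, and recording their variance, gives the statistics of the curvature discussed in the text.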

We sampled the value of the Ricci curvature $K_R$ along the dynamical trajectories of a minimalistic model originally
introduced by Thirumalai and coworkers [17], a three-dimensional off-lattice model of a polypeptide which has only
three different kinds of aminoacids: polar (P), hydrophobic (H) and neutral (N). The potential energy is



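$$V = V_B + V_A + V_D + V_{NB} , \qquad (2)$$

where

$$V_B = \sum_{i=1}^{N-1} \frac{k_r}{2} \left( |\vec r_i - \vec r_{i-1}| - a \right)^2 ; \qquad (3)$$

$$V_A = \sum_{i=1}^{N-2} \frac{k_\vartheta}{2} \left( |\vartheta_i - \vartheta_{i-1}| - \vartheta_0 \right)^2 ; \qquad (4)$$

$$V_D = \sum_{i=1}^{N-3} \left\{ A_i \left[ 1 + \cos \psi_i \right] + B_i \left[ 1 + \cos(3 \psi_i) \right] \right\} ; \qquad (5)$$

$$V_{NB} = \sum_{i=1}^{N-3} \sum_{j=i+3}^{N} V_{ij}(|\vec r_{i,j}|) ; \qquad (6)$$

$\vec r_i$ is the position vector of the $i$-th monomer, $\vec r_{i,j} = \vec r_i - \vec r_j$, $\vartheta_i$ is the $i$-th bond angle, i.e., the angle between $\vec r_{i+1}$ and
$\vec r_i$, $\psi_i$ the $i$-th dihedral angle, that is the angle between the vectors $\vec n_i = \vec r_{i+1,i} \times \vec r_{i+1,i+2}$ and $\vec n_{i+1} = \vec r_{i+2,i+1} \times \vec r_{i+2,i+3}$,
$k_r = 100$, $a = 1$, $k_\vartheta = 20$, $\vartheta_0 = 105°$, $A_i = 0$ and $B_i = 0.2$ if at least two among the residues $i, i+1, i+2, i+3$ are N,
$A_i = B_i = 1.2$ otherwise. As to $V_{ij}$, we have $V_{ij} = \frac{8}{3} \left[ \left( \frac{a}{r} \right)^{12} + \left( \frac{a}{r} \right)^{6} \right]$ if $i,j$ = P,P or $i,j$ = P,H, $V_{ij} = 4 \left[ \left( \frac{a}{r} \right)^{12} - \left( \frac{a}{r} \right)^{6} \right]$
if $i,j$ = H,H and $V_{ij} = 4 \left( \frac{a}{r} \right)^{12}$ if either $i$ or $j$ are N.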
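As an illustration of the nonbonded term, the sketch below evaluates the three pair interactions and the double sum over monomers separated by at least three positions along the chain. It is a minimal Python sketch, not the authors' code. The prefactor 8/3 of the P-P and P-H interaction is an assumption, because that coefficient is not legible in the source text; the names `pair_potential` and `nonbonded_energy` are hypothetical.

```python
import math

def pair_potential(r, type_i, type_j, a=1.0):
    """Nonbonded interaction between beads of type 'P', 'H' or 'N' at distance r."""
    x6 = (a / r) ** 6
    x12 = x6 * x6
    if type_i == "N" or type_j == "N":
        return 4.0 * x12                   # neutral bead involved: purely repulsive
    if type_i == "H" and type_j == "H":
        return 4.0 * (x12 - x6)            # hydrophobic pair: attractive well
    # P-P or P-H pair: repulsive. The 8/3 prefactor is an ASSUMPTION (garbled in source).
    return (8.0 / 3.0) * (x12 + x6)

def nonbonded_energy(positions, sequence, a=1.0):
    """Sum pair_potential over all pairs (i, j) with j >= i + 3 along the chain."""
    n = len(positions)
    total = 0.0
    for i in range(n - 3):
        for j in range(i + 3, n):
            r = math.dist(positions[i], positions[j])
            total += pair_potential(r, sequence[i], sequence[j], a)
    return total

if __name__ == "__main__":
    # Four hydrophobic beads on a straight line with unit spacing: one pair, (0, 3).
    chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(nonbonded_energy(chain, "HHHH"))
```

The sequence is passed as a string of the letters P, H and N, matching the notation used for the sequences in the text.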

Although the identity between dynamical trajectories and geodesics of the Eisenhart metric only holds if the
dynamics is the Newtonian one, a Langevin dynamics, obtained by adding to the deterministic force a random
force according to the fluctuation-dissipation theorem and a friction term proportional to the velocity, is a more
reasonable model of the dynamics of a polymer in aqueous solution when the solvent degrees of freedom are not taken
into account explicitly. Since we are interested not in the details of the time series of $K_R$ along a particular trajectory
but only in its statistical distribution, we may expect that a sampling obtained using the Langevin dynamics also
gives the same information on the geometry of the landscape. To check this assumption we let the system evolve with
both a Newtonian dynamics (using a symplectic algorithm [18] to integrate the equations of motion) and a Langevin
dynamics (using the same algorithm, a modified Verlet, and the same parameters as in Ref. [23]), obtaining very similar
results in the two cases. In the following we shall refer only to results obtained with Langevin dynamics.
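To make the structure of such a Langevin dynamics concrete, the sketch below integrates a friction term plus a random force whose amplitude is fixed by the fluctuation-dissipation theorem. It is a toy example under stated assumptions: it uses a BAOAB splitting integrator, which is a different scheme from the modified Verlet algorithm used in the paper, and it is applied to a hypothetical one-dimensional harmonic well with unit mass and $k_B = 1$ rather than to the polymer model. The name `langevin_harmonic` and all parameter values are illustrative.

```python
import math
import random

def langevin_harmonic(temperature, k=1.0, gamma=1.0, dt=0.05,
                      steps=600, walkers=1000, seed=0):
    """Estimate <x^2> for V(x) = k x^2 / 2 with a BAOAB Langevin integrator.

    Unit mass and k_B = 1 are assumed; equipartition predicts <x^2> = temperature / k.
    """
    rng = random.Random(seed)
    c1 = math.exp(-gamma * dt)                         # velocity damping over one step
    c2 = math.sqrt(temperature * (1.0 - c1 * c1))      # noise amplitude matching the damping
    sum_x2 = 0.0
    for _ in range(walkers):
        x, v = 0.0, 0.0
        for _ in range(steps):
            v += 0.5 * dt * (-k * x)                   # B: half kick from the force
            x += 0.5 * dt * v                          # A: half drift
            v = c1 * v + c2 * rng.gauss(0.0, 1.0)      # O: friction plus random force
            x += 0.5 * dt * v                          # A: half drift
            v += 0.5 * dt * (-k * x)                   # B: half kick
        sum_x2 += x * x
    return sum_x2 / walkers

if __name__ == "__main__":
    print(langevin_harmonic(0.6))   # should be statistically close to 0.6
```

Because the noise amplitude `c2` is tied to the damping factor `c1`, the sampled distribution is the canonical one at the chosen temperature, which is the property that makes a Langevin sampling of $K_R$ comparable to a Newtonian one at the same temperature.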

We considered five different sequences: four of 22 monomers, $S_g^{22}$ = PH9(NP)2NHPH3PH, $S_b^{22}$ =
PHNPH3NHNH4(PH2)2PH, $S_i^{22}$ = P4H5NHN2H6P3 and $S_h^{22}$ = H22, and also a homopolymeric sequence of 44 monomers,
$S_h^{44}$ = H44. Sequence $S_g^{22}$ had already been identified as a good folder and our simulations confirmed this finding:
below a given temperature it always reached the same $\beta$-sheet-like structure. Homopolymers $S_h^{22}$ and $S_h^{44}$, on the
other hand, showed a hydrophobic collapse but no tendency to reach a particular configuration in the collapsed phase.
Sequence $S_b^{22}$ (which has the same overall composition as $S_g^{22}$, rearranged in a different sequence) behaved as a bad
folder and did not reach a unique native state, while $S_i^{22}$ was constructed by us to show a somewhat intermediate
behavior between good and bad folders: it always formed the same structure involving the middle of the sequence,
while the beginning and the end of the chain fluctuated also at low temperature. As to standard thermodynamic
observables, all the sequences showed very similar behaviors: in particular, the specific heat $c_V$ of both the homopolymer
$S_h^{22}$ and the good folder $S_g^{22}$ exhibits a peak at the transition (data not shown), and on the sole basis of this
quantity it would be hard to discriminate between a simple hydrophobic collapse and a folding.

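On the other hand, a dramatic difference between the homopolymer and the good folder shows up if we consider the
geometric properties of the landscape, and in particular the fluctuations of the Ricci curvature $K_R$. We defined a
relative dimensionless curvature fluctuation $\sigma$ as

$$\sigma = \frac{\sqrt{\frac{1}{N} \left( \langle K_R^2 \rangle_t - \langle K_R \rangle_t^2 \right)}}{\frac{1}{N} \langle K_R \rangle_t} , \qquad (7)$$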

where $\langle \cdot \rangle_t$ stands for a time average: in Fig. 1 we plot $\sigma$ as a function of the temperature $T$ for the homopolymer
$S_h^{22}$ and for the good folder $S_g^{22}$. A peak shows up in the case of the good folder, close to the folding temperature $T_f$
(which we estimated as $T_f = 0.6 \pm 0.05$), below which the system is mostly in the native state, while no particular
mark of the hydrophobic collapse can be seen in the case of the homopolymer. As to the other sequences, for the
longer homopolymer $S_h^{44}$, $\sigma(T)$ is even smoother than for $S_h^{22}$, at variance with the specific heat, which develops a
sharper peak consistently with the presence of a thermodynamic $\theta$-transition as $N \to \infty$ (data not shown); for the
bad folder $S_b^{22}$, $\sigma(T)$ is not as smooth as for the homopolymers, but only a very weak signal is found at a lower
temperature than that of the peak in $c_V$, i.e., at the temperature where the system starts to behave as a glass; for
the "intermediate" sequence $S_i^{22}$ a peak is present at the "quasi-folding" temperature, although considerably broader
than in the case of $S_g^{22}$ (data not shown).
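The relative curvature fluctuation can be estimated from any time series of $K_R$ sampled along a trajectory. The Python sketch below is a minimal illustration, assuming that both the variance and the mean of $K_R$ are normalized by the number of degrees of freedom $N$ before taking the ratio; that normalization is an assumption here, since the defining equation is not fully legible in the source. The function name `relative_curvature_fluctuation` and the sample values are hypothetical.

```python
import math

def relative_curvature_fluctuation(k_samples, n_dof):
    """Return sqrt(var(K_R) / N) / (mean(K_R) / N) for samples of K_R.

    `k_samples` is a sequence of K_R values taken along one trajectory and
    `n_dof` is N. The 1/N normalization is an ASSUMPTION of this sketch.
    """
    count = len(k_samples)
    mean = sum(k_samples) / count
    variance = sum((k - mean) ** 2 for k in k_samples) / count   # time-average variance
    return math.sqrt(variance / n_dof) / (mean / n_dof)

if __name__ == "__main__":
    samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # made-up numbers, mean 5, variance 4
    print(relative_curvature_fluctuation(samples, n_dof=4))
```

Repeating this estimate at several temperatures gives a curve of the kind discussed above, at a cost comparable to that of accumulating energy fluctuations for the specific heat.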

The behavior of $\sigma(T)$ can thus be used to mark the folding transition and to identify good folders within the
model considered here. It must be stressed that no knowledge of the native state is necessary to define $\sigma$, and that
it can be computed with the same computational effort needed to obtain the specific heat and other thermodynamic 
observables. 

What is the origin of this behavior of $\sigma(T)$? While we do not have a complete answer yet, we argue it is a
consequence of the effective two-state dynamics of this system close to the folding transition: an average curvature
in the folded state considerably larger than in the denatured state (due to the more pronounced effective potential
well in the native state) would naturally imply a sudden increase of the fluctuations of the curvature as the system
approaches the folding transition. Moreover, higher fluctuations imply a higher degree of instability of the dynamics,
as is expected close to $T_f$, where the polymer has essentially the same probability of being folded or swollen. This
result also opens a connection between the folding transition and symmetry-breaking phase transitions: the behavior
of $\sigma(T)$ observed here for the good folder $S_g^{22}$ is remarkably close to that exhibited by finite systems undergoing a
symmetry-breaking phase transition in the thermodynamic limit [19]. This suggests that the folding of a protein-like
heteropolymer does share some features of "true" symmetry-breaking phase transitions, at least those that show up
already in finite systems, although no singularity in the thermodynamic limit occurs, because proteins are intrinsically
finite objects.
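The two-state argument can be made explicit with a short calculation. The following is a sketch under assumptions that are not in the paper: the system is taken to spend a fraction $p$ of the time in the folded state, where $K_R$ has mean $K_f$ and variance $s_f^2$, and a fraction $1 - p$ in the denatured state, with mean $K_d$ and variance $s_d^2$. All these symbols are introduced here only for illustration.

```latex
\begin{align}
  \langle K_R \rangle_t &= p\, K_f + (1 - p)\, K_d , \\
  \langle K_R^2 \rangle_t - \langle K_R \rangle_t^2
    &= p\, s_f^2 + (1 - p)\, s_d^2 + p\,(1 - p)\,(K_f - K_d)^2 .
\end{align}
```

The last term comes from switching between the two states: it vanishes for $p \to 0$ and $p \to 1$ and is largest at $p = 1/2$, i.e., where the folded and denatured states are equally probable. If $K_f$ is considerably larger than $K_d$, this term produces an excess of curvature fluctuations localized around the folding temperature, whereas a polymer without two well-separated states has no such contribution.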

To summarize, we have shown that the geometry of the energy landscape, and in particular the fluctuations $\sigma$ of
its curvature, can be used to mark the folding transition and to identify polymers having a protein-like behavior, in
the context of a minimalistic model. If tested successfully on other, maybe more refined models of proteins, $\sigma$ might
prove a useful tool in the search of protein-like sequences. The geometric nature of $\sigma$ may provide an insight into the
nature of the folding transition itself and suggests a connection with symmetry-breaking phase transitions.

FP6 EMBIO project (EC contract n. 012835).